GAD: General Activity Detection for Fast Clustering on Large Data
نویسندگان
چکیده
In this paper, we propose GAD (General Activity Detection) for fast clustering on large scale data. Within this framework we design a set of algorithms for different scenarios: (1) Exact GAD algorithm E-GAD, which is much faster than K-Means and gets the same clustering result. (2) Approximate GAD algorithms with different assumptions, which are faster than E-GAD while achieving different degrees of approximation. (3) GAD based algorithms to handle the ”large clusters” problem which appears in many large scale clustering applications. Two existing activity detection algorithms GT and CGAUTC are special cases under the framework. The most important contribution of our work is that the framework is the general solution to exploit activity detection for fast clustering in both exact and approximate senarios, and our proposed algorithms within the framework can achieve very high speed. Extensive experiments have been conducted on several large datasets from various real world applications; the results show that our proposed algorithms are effective and efficient.
منابع مشابه
A general framework for efficient clustering of large datasets based on activity detection
Data clustering is one of the most popular data mining techniques with broad applications. KMeans is one of the most popular clustering algorithms, due to its high efficiency/effectiveness and wide implementation in many commercial/non-commercial softwares. Performing efficient clustering on large dataset is especially useful; however, conducting K-Means clustering on large data suffers heavy c...
متن کاملA Hybrid Framework for Building an Efficient Incremental Intrusion Detection System
In this paper, a boosting-based incremental hybrid intrusion detection system is introduced. This system combines incremental misuse detection and incremental anomaly detection. We use boosting ensemble of weak classifiers to implement misuse intrusion detection system. It can identify new classes types of intrusions that do not exist in the training dataset for incremental misuse detection. As...
متن کاملDetection of lung cancer using CT images based on novel PSO clustering
Lung cancer is one of the most dangerous diseases that cause a large number of deaths. Early detection and analysis can be very helpful for successful treatment. Image segmentation plays a key role in the early detection and diagnosis of lung cancer. K-means algorithm and classic PSO clustering are the most common methods for segmentation that have poor outputs. In t...
متن کاملFast Unsupervised Automobile Insurance Fraud Detection Based on Spectral Ranking of Anomalies
Collecting insurance fraud samples is costly and if performed manually is very time consuming. This issue suggests usage of unsupervised models. One of the accurate methods in this regards is Spectral Ranking of Anomalies (SRA) that is shown to work better than other methods for auto insurance fraud detection specifically. However, this approach is not scalable to large samples and is not appro...
متن کاملModification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis
Recognizing genes with distinctive expression levels can help in prevention, diagnosis and treatment of the diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering the gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2009